# Transformer Architecture

Wav2vec2 Base Librispeech Demo Colab

A speech recognition model fine-tuned from facebook/wav2vec2-base on the LibriSpeech dataset, achieving a word error rate (WER) of 0.3174 on the evaluation set.

Speech Recognition
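The WER reported for the wav2vec2 demo model above is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words: WER = (S + D + I) / N, counting substitutions, deletions, and insertions. By this definition, 0.3174 corresponds to roughly 32 word errors per 100 reference words. The following is a minimal, dependency-free sketch of that definition. It is not the evaluation script used for this model, and it assumes plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + substitution_cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In this toy example there is one substitution (sat to sit) and one deletion (the) over six reference words, so the sketch returns 2/6, about 0.333. Because insertions are counted, WER can exceed 1.0.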

Videomae Base Finetuned Ucf101 Subset

A video classification model fine-tuned from the VideoMAE base model on a subset of UCF101.

Video Processing

X2I is a multimodal diffusion Transformer model capable of converting various input modalities (text, images, videos, audio, speech) into image outputs.

Text-to-Image Other

Latex Finetuned

A Transformer-based optical character recognition model optimized for processing handwritten math images and structured math syntax.

Text Recognition

Unixcoder Code Vulnerability Detector

A C/C++ code vulnerability detection model fine-tuned from Microsoft's UniXcoder, achieving an accuracy of 68.34% and an F1 score of 62.14%.

Text Classification

Transformers English
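Accuracy and F1 measure different things for a detector like the UniXcoder vulnerability model above. Accuracy is the fraction of all samples classified correctly. F1 is the harmonic mean of precision and recall for the positive ("vulnerable") class. The listing does not state which averaging scheme was used for the reported F1. The sketch below therefore assumes binary labels, with 1 meaning vulnerable. It only illustrates how the two metrics are computed and does not reproduce the reported 68.34% and 62.14%. The labels in the usage line are hypothetical toy data.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float]:
    """Compute accuracy and binary F1 for the given positive label (assumed: 1 = vulnerable)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical toy labels: 1 = vulnerable function, 0 = safe function
acc, f1 = accuracy_and_f1([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```

The two numbers can diverge sharply. A model that labels every function as safe scores high accuracy on a mostly safe dataset but an F1 of zero for the vulnerable class, which is one reason to report both.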

Digitaledutransformers

A Transformer-based tabular classification model for financial data analysis

Text Classification

SnowFlash383935

A DNA sequence embedding model based on the Transformer architecture, supporting sequence alignment and genomics applications.

Molecular Model

roychowdhuryresearch

Finedefics is an open-source multimodal large language model (MLLM) that enhances fine-grained visual recognition (FGVR) capabilities by incorporating object attribute descriptions.

Terjman Large V2.0

Terjman Large-v2.0 is a Transformer-based English-Moroccan dialect translation model with significantly improved performance, comparable to commercial models.

Machine Translation

Transformers Supports Multiple Languages

BounharAbdelaziz

Tabpfn Mix 1.0 Regressor

TabPFNMix is a tabular foundation model pretrained on purely synthetic datasets, utilizing an encoder-decoder Transformer architecture, suitable for tabular data regression tasks.

Materials Science

Tabpfn Mix 1.0 Classifier

A tabular foundation model pretrained on synthetic datasets generated from a mixture of random classifiers.

Molecular Model

Rtdetr V2 R101vd

RT-DETRv2 is a real-time object detection model based on the Transformer architecture, enhanced with an improved baseline and a "bag of freebies" (training-time optimizations that raise accuracy without adding inference cost).

Object Detection

Pixart Sigma Nitro

AMD Nitro Diffusion is a series of efficient text-to-image models distilled from mainstream diffusion models on AMD Instinct™ GPUs. PixArt-Sigma Nitro is a high-resolution, single-step text-to-image model based on the Transformer architecture.

Image Generation

Trocr Base Handwritten Ru

The TrOCR model is a Transformer-based optical character recognition model, specifically fine-tuned for Russian handwritten text.

Transformers Other

Materials.selfies Ted

A Transformer-based encoder-decoder model specifically designed for molecular representation using SELFIES

Molecular Model

Speecht5 Fine Tune En

An English text-to-speech (TTS) model fine-tuned from Microsoft's SpeechT5, specialized in generating speech from technical-domain text.

Speech Synthesis

Transformers English

LWM is the first foundational model in the field of wireless communications, developed as a universal feature extractor capable of extracting fine-grained representations from wireless channel data.

PGTFormer is an image-to-image transformation model implemented in PyTorch and pushed to the Hugging Face Hub via PyTorchModelHubMixin.

Image Generation

Timesformer Base Finetuned K400

TimeSformer is a Transformer-based video understanding model, specifically fine-tuned on the Kinetics-400 dataset.

Video Processing

Segformer B2 Human

A fashion image segmentation model based on the SegFormer architecture, designed for fine-grained segmentation of clothing and accessories.

Image Segmentation

Trocr Math Handwritten

TrOCR is a Transformer-based OCR model specifically designed for recognizing handwritten mathematical formulas

A sentence segmentation model based on a 12-layer Transformer architecture, supporting multilingual text segmentation.

Sequence Labeling

Transformers Supports Multiple Languages

segment-any-text

MeshAnything is an artist-grade mesh generation model based on autoregressive Transformers, capable of converting images or point clouds into high-quality 3D mesh models.

Dab Detr Resnet 50

DAB-DETR is an improved DETR object detection model that significantly speeds up training convergence and improves detection accuracy by using dynamic anchor boxes as queries.

Object Detection

Transformers English

Block Diagram Global Information

A Transformer model based on the Donut framework, designed to extract overall summary information from block diagram images; it supports English and Korean.

Transformers Supports Multiple Languages

RT-DETR is the first real-time end-to-end object detection Transformer, achieving efficient NMS-free detection (no non-maximum suppression post-processing) through a hybrid encoder and a query selection mechanism.

Object Detection

Transformers English

MOMENT is a series of general-purpose foundation models for time series analysis that support multiple tasks, work out of the box, and can be further improved through fine-tuning.

Materials Science

BERTurk-Legal is a Transformer-based language model specifically designed for prior case retrieval tasks in the Turkish legal domain.

Large Language Model

Transformers Other

Segformer B2 Fashion

A fashion image segmentation model fine-tuned from the SegFormer architecture, designed to identify and segment different apparel categories in clothing images.

Image Segmentation

Vsft Llava 1.5 7b Hf Trl

A multimodal vision-language model based on LLaVA-1.5-7B trained through Visual Supervised Fine-Tuning (VSFT), supporting image understanding and dialogue generation

Transformers English

Pix2text Table Rec

A table structure recognition model built on Microsoft's Table Transformer for table detection and recognition in documents.

Text Recognition

Model Timesformer Subset 02

A video understanding model based on the TimeSformer architecture, fine-tuned on an unspecified dataset and reaching an accuracy of 88.52%.

Video Processing

Translate Ar En V1.0 Hplt

This is a Transformer-based machine translation model from Arabic to English, trained exclusively on HPLT data.

Machine Translation

Transformers Supports Multiple Languages

Trocr Large Spanish

A Transformer-based OCR model for printed Spanish text; it is optimized for printed fonts and does not support handwriting recognition.

Transformers Supports Multiple Languages

Trocr Small Spanish

A Transformer-based OCR model optimized for printed Spanish text; it does not support handwriting recognition.

Text Recognition

Transformers Supports Multiple Languages

Table Transformer Structure Recognition V1.1 All

A Transformer-based model for table structure recognition, designed to detect table structures in documents

Text Recognition

Table Transformer Structure Recognition V1.1 Fin

A table structure recognition model based on the DETR architecture, specifically designed for detecting and analyzing table structures in documents.

Text Recognition

Table Transformer Structure Recognition V1.1 Pub

A Table Transformer model trained on the PubTables-1M dataset for table structure recognition in documents.

Text Recognition

Table Transformer Detection Custom Ale

A table detection model based on DETR architecture, specifically designed to identify table regions in documents

Text Recognition

Medical Summarization

A T5-based model fine-tuned for medical text summarization, generating concise, coherent summaries of medical documents, research papers, clinical notes, and other healthcare-related texts.

Text Generation

Transformers English


© 2025 AIbase